SIMP59: Data Selection and Visualisation
7.5 credits VT25
This lecture recaps R Markdown notebooks and the use of dplyr pipes to streamline data analysis workflows. We will then explore univariate data visualizations, focusing on how to visualize the distribution of a single variable using ggplot2. Participants will learn how to construct ggplot2 calls to create clear and informative plots, including bar charts for categorical variables and histograms for numerical variables. The session will cover the concept of frequency, that is, how often each category or range of values occurs in the data, and demonstrate how to interpret distribution patterns to gain insights from data.
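As a rough illustration of the univariate workflow described above, the following sketch builds a frequency table with a dplyr pipe and then draws a bar chart and a histogram. It assumes the `mpg` example dataset that ships with ggplot2; the choice of dataset, the variables `class` and `hwy`, the bin width, and the object names are illustrative assumptions, not part of the course material.

```r
library(dplyr)
library(ggplot2)

# Frequency table for a categorical variable, built with a dplyr pipe.
# `mpg` is an example dataset bundled with ggplot2 (an assumption for this sketch).
class_counts <- mpg %>%
  count(class, name = "frequency") %>%
  arrange(desc(frequency))

# Bar chart for a categorical variable: geom_bar() counts the rows per category.
bar_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  labs(x = "Vehicle class", y = "Frequency")

# Histogram for a numerical variable: values are grouped into bins
# and the rows in each bin are counted. The bin width of 2 is arbitrary.
hist_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Highway fuel economy (miles per gallon)", y = "Frequency")

print(class_counts)
print(bar_plot)
print(hist_plot)
```

In an R Markdown notebook the plots appear below the chunk. Changing `binwidth` is a quick way to see how strongly the apparent shape of a distribution depends on the binning.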
We will also explore bivariate data visualizations to analyze relationships between variables. We will cover techniques for visualizing relationships between one numerical dependent variable and one categorical independent variable, as well as methods for comparing two categorical or two numerical variables. Participants will learn how to represent amounts and proportions effectively and use x–y plots to examine trends and correlations. The session will also address visualizing uncertainty in data and best practices for interpreting variability in relationships. Finally, we will demonstrate how to save plots for reporting and presentation purposes.
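The bivariate cases listed above might look roughly like the sketch below. It again assumes the `mpg` dataset bundled with ggplot2; the variable choices, the linear trend line, the output file name, and the plot dimensions are illustrative assumptions rather than the course's prescribed approach.

```r
library(ggplot2)

# Numerical dependent variable by categorical independent variable:
# one boxplot per vehicle class.
box_plot <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Two categorical variables: position = "fill" rescales each bar to 1,
# so the segments show proportions instead of counts.
prop_plot <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

# Two numerical variables: x-y scatter plot with a fitted linear trend.
# With se = TRUE the shaded ribbon shows a confidence interval around
# the fitted line (95% by default), one way to visualize uncertainty.
scatter_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Save a plot for a report. The file name is hypothetical; a temporary
# directory is used here so the sketch does not write into the project.
out_file <- file.path(tempdir(), "hwy_vs_displ.png")
ggsave(out_file, plot = scatter_plot, width = 6, height = 4, dpi = 300)
```

Passing `plot =` explicitly to `ggsave()` avoids relying on whichever plot happened to be displayed last.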
Figure 1: In this section of the book, you’ll learn how to import, tidy, transform, and visualize data.
12 Logical vectors, 13 Numbers, 14 Strings, 15 Regular expressions, 16 Factors, 17 Dates and times, 18 Missing values, 19 Joins
20 Spreadsheets, 21 Databases, 22 Arrow, 23 Hierarchical data
Figure 2: The column names of pivoted columns become values in a new column. The values need to be repeated once for each row of the original dataset.
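The behaviour described in the caption can be reproduced with a small sketch, assuming it refers to lengthening data with `pivot_longer()` from tidyr. The data frame and the column names `bp1`, `bp2`, `measurement`, and `value` are made up for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per measurement.
wide <- data.frame(
  id  = c("A", "B"),
  bp1 = c(100, 140),
  bp2 = c(120, 115)
)

# The names of the pivoted columns (bp1, bp2) become values in the new
# `measurement` column, repeated once for each row of `wide`.
long <- pivot_longer(
  wide,
  cols = c(bp1, bp2),
  names_to = "measurement",
  values_to = "value"
)

print(long)
```

The result has four rows, two for each original `id`, which is the long layout that ggplot2 aesthetics typically expect.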
Data collection (Nov 12)
Exam question 1
Data analysis (Nov 26)
Exam question 2
Workshop 2 (Dec 2)
References